To get started, let’s analyze the data for each borough from 2013-2015. Analyzing at the borough level gives us a high-level view before the more in-depth, school-level analysis in later sections.
The distributions of student percentages seem to peak around 27% for Level 1, 36% for Level 2, 23% for Level 3, and 7% for Level 4. All the distributions look roughly normal, especially the Level 1, 2, and 3 percentages.
It looks like females’ mean scores begin to overtake males’ at around 303. Males peak at around 297, while females peak at around 307, and the lowest female mean score seems to fall around 283. White and Asian students’ mean scores become more concentrated around 310, where they peak, and then slowly taper off as the mean score increases. A large proportion of Black and Hispanic students’ mean scores fall between 280 and 300. Black students’ mean scores max out around 295, and Hispanic students’ around 305. English Proficient (EP) students seem to peak at around the same mean score as females. English Language Learner (ELL) students are spread fairly evenly between mean scores of 250 and 290, and there seems to be no ELL representation past 292.
Since most of the variables in the dataset are categorical, let’s analyze how the level distributions vary by gender and race. First, we generate histograms of each level for females.
Each level seems to have a roughly normal distribution, with the mean around 25% for Level 1, 36% for Level 2, 22% for Level 3, and 12% for Level 4. Let’s compute summary statistics to see whether the means and medians are consistent with normal distributions.
| Level | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 1 | 3.50 | 23.20 | 32.30 | 34.80 | 44.32 | 87.10 |
| 2 | 9.8 | 30.1 | 34.6 | 34.2 | 38.6 | 65.9 |
| 3 | 0.00 | 13.80 | 19.60 | 19.71 | 25.50 | 52.70 |
| 4 | 0.00 | 3.70 | 8.20 | 11.29 | 16.33 | 59.00 |
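For Levels 2 and 3, the mean and median are nearly equal (34.2 vs. 34.6 and 19.71 vs. 19.60), which is consistent with roughly symmetric distributions. For Levels 1 and 4, the mean sits above the median (34.80 vs. 32.30 and 11.29 vs. 8.20), which points to some right skew. My estimates for the Level 2, 3, and 4 means were fairly close to the actual values, while my Level 1 estimate (25%) was well below the actual mean of 34.80%. Let’s look at histograms for males across the different levels.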
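Summary statistics cannot confirm normality on their own. A quick supplementary check is a quartile-based skewness measure. The sketch below uses Bowley’s quartile skewness, which is a standard statistic but was not computed in the original analysis. The function name is illustrative, and the quartiles are copied from the female summary output above.

```python
# Bowley (quartile) skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1).
# 0 means the quartiles are symmetric around the median; positive means right skew.
def bowley_skewness(q1: float, median: float, q3: float) -> float:
    spread = q3 - q1
    if spread == 0:
        raise ValueError("Q3 equals Q1; quartile skewness is undefined")
    return (q3 + q1 - 2 * median) / spread

# (1st Qu., Median, 3rd Qu.) for females, copied from the summary output above.
female_quartiles = {
    "Level 1": (23.20, 32.30, 44.32),
    "Level 2": (30.10, 34.60, 38.60),
    "Level 3": (13.80, 19.60, 25.50),
    "Level 4": (3.70, 8.20, 16.33),
}

skew = {level: bowley_skewness(*q) for level, q in female_quartiles.items()}
for level, value in skew.items():
    print(f"{level}: {value:+.3f}")
```

With these inputs, Levels 2 and 3 come out close to zero (about -0.06 and 0.01). Levels 1 and 4 are positive (about 0.14 and 0.29), which matches the right skew suggested by their means sitting above their medians.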
The distributions are roughly the same as the female distributions. Let us see if the summary statistics tell a different story.
| Level | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 1 | 3.50 | 23.20 | 32.30 | 34.80 | 44.32 | 87.10 |
| 2 | 9.8 | 30.1 | 34.6 | 34.2 | 38.6 | 65.9 |
| 3 | 0.00 | 13.80 | 19.60 | 19.71 | 25.50 | 52.70 |
| 4 | 0.00 | 3.70 | 8.20 | 11.29 | 16.33 | 59.00 |
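These summary statistics are identical to the female ones, which agrees with my visual inference. An exact match on every statistic is unusual, though, so it is worth confirming that these values were computed on the male subset. Next, let’s look at histograms by race, starting with African American students.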
We can see high counts of observations in the Level 1 and 2 plots, and for African American students the Level 1 mean is almost 50%. Let’s see what the data for Hispanic students looks like.
The plots are very similar to those for African American students. Let’s look at histograms for White students.
There is a noticeable decrease in the share of White students at Levels 1 and 2 and a larger share at Levels 3 and 4. Finally, we look at histograms for Asian students.
Asian students have much lower Level 1 percentages, while their Level 2 and 3 distributions are roughly equivalent to those of White students. Where Asian students really stand out is Level 4: in some observations, 60% of the Asian students were at Level 4.
It looks like Queens has the highest median mean score of all the boroughs. Now we want to analyze the scores for each subject for all students:
From the plots, the Bronx has significantly lower test scores than the other boroughs. Staten Island has the highest median mean score in ELA, while Queens has the highest median mean score in Mathematics. Next, we look at the scores by gender across all boroughs.
The plots show that females outperform males in both Math and ELA.
The gap in median scores between males and females is substantial. Now let’s look at the breakdown by race across all boroughs.
The data shows that Asian students outperform all other groups in all categories.
The median score for Asian students is at least 10 points higher than that of every other group. Black and Hispanic students appear to have markedly lower test scores than the other groups. Next, we look at the breakdown by English proficiency: English Proficient students versus English Language Learners.
The data shows that English Proficient students had the highest median scores, which is expected. The data suggests that not speaking English is a barrier that is reflected in test score outcomes. Former English Language Learners do almost as well as English Proficient students.
I found it interesting that although Queens has roughly the same proportions of Level 2, 3, and 4 students as the other boroughs, it had the highest median of mean scale scores across all the boroughs.
It seems that females had a lower proportion of Level 1 students and higher proportions of Level 2, 3, and 4 students than males.
Black students had a very large number of Level 1 and 2 students, with almost 50% of them at Level 1. Asian students had the lowest median proportions of Level 1 and 2 students and the highest median proportions of Level 3 and 4 students.
White students had lower median proportions of Level 1 and 2 students than Black and Hispanic students, and a Level 2 proportion relatively close to that of Asian students.
Let’s look at the breakdown of the various levels using the school-level data.
The Level 1 distribution is fairly uniform between 25% and 50% and tapers off afterwards. The Level 2 distribution is roughly normal, with a median around 37%. The Level 3 median appears to fall around 13%. The Level 4 distribution is long-tailed, so to get a better view we plot its histogram on a log scale.
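On a pure log axis, schools with 0% at Level 4 have no position, because log 0 is undefined. One common workaround is to bin log10(1 + x) instead, which maps 0% to 0. The sketch below is a minimal, assumption-labeled illustration of that idea. The function name and the sample values are hypothetical, and the report does not say whether its plots used this shift or simply dropped the 0% values.

```python
import math
from collections import Counter

def log1p_histogram(percentages, n_bins=10, max_pct=100.0):
    """Count percentages in equal-width bins on a log10(1 + x) axis.

    Shifting by 1 keeps 0% plottable: 0% maps to 0 and 100% maps to log10(101).
    Returns (lower_edge_in_percent, count) pairs, one per bin.
    """
    width = math.log10(1 + max_pct) / n_bins
    counts = Counter()
    for pct in percentages:
        if not 0 <= pct <= max_pct:
            raise ValueError(f"percentage out of range: {pct}")
        # Clamp so that max_pct lands in the last bin rather than one past it.
        idx = min(int(math.log10(1 + pct) / width), n_bins - 1)
        counts[idx] += 1
    # Convert each bin's lower edge back from log space to a percentage.
    return [(10 ** (i * width) - 1, counts[i]) for i in range(n_bins)]

# Hypothetical Level 4 percentages for a handful of schools.
sample = [0, 0, 0, 0.5, 2, 8, 15, 40, 100]
hist = log1p_histogram(sample)
for edge, count in hist:
    print(f">= {edge:6.2f}%: {count}")
```

This is the same idea as `math.log1p`, but in base 10 so that the bin edges are easier to read as percentages.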
Although the largest share of schools have 0% of students at Level 4, a good portion of other schools had a substantial percentage of their students test at Level 4. The actual distribution is much easier to see on the log scale, so for the remaining histograms we plot the Level 4 percentages with the log-scale transformation. Next, we look at histograms for females:
The distributions for the females are roughly the same as the distributions for all students. There are a large number of 0% entries for levels 1 and 3. Let’s look at histograms for Males:
The distributions are roughly the same as for females. There is not much new insight beyond what the borough-level univariate analysis showed; the data is just more fine-grained. Let’s see what the data tells us in the bivariate analysis.
I thought it would be useful to add a field for the borough each school resides in, so I wrote a function to derive it from the borough letter in the school’s DBN: “M” is for Manhattan, “K” is for Brooklyn, “X” is for the Bronx, “Q” is for Queens, and “R” is for Staten Island.
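The borough lookup can be sketched as a small mapping function. This is a minimal Python illustration, not the report’s own function. It assumes the standard DBN (District-Borough-Number) layout of a 2-digit district, the borough letter, and a 3-digit school number, so the letter is the third character. The function name `borough_from_dbn` is hypothetical.

```python
# Borough letters as listed above.
BOROUGH_BY_CODE = {
    "M": "Manhattan",
    "K": "Brooklyn",
    "X": "Bronx",
    "Q": "Queens",
    "R": "Staten Island",
}

def borough_from_dbn(dbn: str) -> str:
    """Map a DBN such as '01M015' to its borough name.

    Assumes the layout District (2 digits) + Borough letter + Number (3 digits),
    so the borough letter is the third character.
    """
    dbn = dbn.strip().upper()
    if len(dbn) < 3:
        raise ValueError(f"DBN too short: {dbn!r}")
    code = dbn[2]
    if code not in BOROUGH_BY_CODE:
        raise ValueError(f"unknown borough code {code!r} in DBN {dbn!r}")
    return BOROUGH_BY_CODE[code]

print(borough_from_dbn("01M015"))  # Manhattan
```

Raising on an unknown letter, rather than returning a placeholder, makes malformed DBNs surface during wrangling instead of silently becoming a sixth “borough”.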
Since the Level 1-4 percentages are among the few continuous fields in this dataset, they seemed well suited to bivariate analysis. The first plot is Mean Score vs. Level1Percentage.
Because there is a lot of overplotting, we lower the points’ alpha (transparency) to reduce it.
Looking at the Level 1 percentage plot, we see some interesting trends at the extremes (0% and 100%). Schools with 0% of their tested students at Level 1 had mean scores ranging from 300 to 375, whereas schools with 100% at Level 1 had mean scores ranging from 225 to 275. Most schools have Level 1 percentages between 0% and 70%. There appears to be a strong negative, roughly linear relationship between a school’s Level 1 percentage and its mean score across NYC.
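A “strong linear relationship” can be quantified with Pearson’s correlation coefficient r. The sketch below is a minimal stdlib implementation. The input values are made up purely to illustrate a perfectly linear negative relationship and are not from the dataset. Applied to the real Level1Percentage and mean score columns, it would give the actual strength of the relationship, which the plots alone do not report.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values, not taken from the dataset.
level1_pct = [0, 20, 40, 60, 80, 100]
mean_score = [340, 320, 300, 280, 260, 240]
r = pearson_r(level1_pct, mean_score)
print(r)
```

Values near -1 indicate a strong negative linear relationship; values near 0 indicate little linear association, even if the points follow a curved pattern.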
The plot of Mean Score vs. Level 2 percentage is interesting because it has an arrow shape. Mean scores vary widely for schools whose Level 2 percentage is 25% or less. Beyond the 25% threshold, the mean scores converge toward 300.
The plot of Mean Score vs. Level 3 percentage shows a closer-to-linear, positive relationship. Most schools have Level 3 percentages between 3% and about 35%, and their mean scores fall mostly between 275 and 325. Past the 35% threshold, mean scores continue to rise, but the number of schools with such high Level 3 percentages gradually diminishes.
Most schools have between 1% and 25% of their students at Level 4. Even at low Level 4 percentages, most mean scores are above 300. The number of schools thins out past 25%, but mean scores keep increasing steadily, reaching as high as 380.
Now that we have a good feel for the data from the most recent years (2013-2015), let’s wrangle the older data and combine it with the recent data to look for trends. While looking at the 2006-2012 data, I noticed that its mean scores were higher than those in the 2013-2015 data. After some searching on the web, I found this site. The documents there show that the score scales vary from year to year, so the mean scale score is not a useful metric when combining data from 2006-2012 with data from 2013-2015.
The medians of the average percentages are roughly equal across all boroughs for Levels 1, 2, and 3. For Level 4, the medians are roughly equal among the Bronx, Brooklyn, and Manhattan, and separately roughly equal between Queens and Staten Island.
These plots support the claim that 2009 was the best year of them all, showing lower median average percentages for Level 1 and 2 students and higher median average percentages for Level 3 and 4 students. The Level 1 and 3 plots show the most drastic changes across the years. In the Level 1 plot, the median decreases over 2006-2009, rises and then holds steady over 2010-2012, and then increases sharply and holds steady over 2013-2015. Level 3 shows the inverse trend.
The median is roughly the same across all grades, with slightly lower values for grades 3-5 at Levels 1 and 2 and slightly higher values for grades 3-5 at Levels 3 and 4.
After combining all the data from 2006-2015, we have over 630,000 observations. First, we examine Level1Percentage vs. Number.Tested, colored by borough.
Looking at the graph, I notice that a large share of the observations for the Bronx and Brooklyn lie between 30% and 80%, and these observations also tend to have fewer than 200 students tested. Observations for Queens and Staten Island tend to fall below 25% when the number of students tested ranges from 100 to 500. Manhattan’s observations are hard to make out in this plot, so to get a better view we generate a plot with only the Manhattan observations.
The Manhattan plot looks roughly the same as the original: high Level 1 percentages where fewer than 200 students were tested and lower percentages where more than 200 were tested. Let’s see how the observations are distributed across boroughs:
| Bronx | Brooklyn | Manhattan | Queens | Staten Island |
|---|---|---|---|---|
| 114623 | 159787 | 88999 | 124531 | 25698 |
Given the number of observations that belong to Manhattan, it seems that its lack of points in the original scatterplot is due to overplotting.
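To put the Manhattan count in context, the counts above can be converted into shares of the total. This is a minimal sketch that uses only the five numbers from the table.

```python
# Observation counts per borough, copied from the table above.
counts = {
    "Bronx": 114623,
    "Brooklyn": 159787,
    "Manhattan": 88999,
    "Queens": 124531,
    "Staten Island": 25698,
}

total = sum(counts.values())
shares = {borough: n / total for borough, n in counts.items()}
ranking = sorted(counts, key=counts.get, reverse=True)

# Print boroughs from most to fewest observations with their share of the total.
for borough in ranking:
    print(f"{borough:<14} {counts[borough]:>7,}  {shares[borough]:6.1%}")
```

The five counts sum to 513,638. Manhattan accounts for about 17% of them and ranks fourth of the five boroughs, far from a sparse group, which is consistent with the overplotting explanation.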
Let’s facet the data by borough to see whether this trend holds across all boroughs.
It seems that across all boroughs, the percentage of Level 1 students decreases once the number of students tested passes a certain threshold. My suspicion is that the number of Level 1 students stays roughly constant, but as the number of students tested increases, they become a smaller percentage.
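The hypothesis rests on simple arithmetic: percentage = 100 × count / number tested. A fixed count therefore produces a percentage that shrinks in proportion to 1 / number tested. The helper name below is hypothetical, and the count of 30 is an illustrative value, not a figure from the data.

```python
def level1_percentage(level1_count: int, number_tested: int) -> float:
    """Percentage of tested students who scored at Level 1."""
    if number_tested <= 0:
        raise ValueError("number_tested must be positive")
    return 100.0 * level1_count / number_tested

# Hold the Level 1 count fixed at 30 and grow the tested population.
for tested in (50, 100, 200, 400):
    print(tested, level1_percentage(30, tested))
# 50 60.0 / 100 30.0 / 200 15.0 / 400 7.5
```

Doubling the number tested halves the percentage. Comparing raw Level 1 counts against Number.Tested, as done next, separates this purely arithmetic effect from a genuine drop in the count.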
Let’s see whether the data supports this hypothesis by comparing Level1Count vs. Number.Tested, faceted by borough.
My hypothesis is only partly supported: in Brooklyn and Queens, the number of Level 1 students decreases as the number of students tested increases. Let’s see what the same plot looks like faceted by year:
The trend seems to be the same from 2006-2012, with a consistent decrease in the Level 1 percentage from 2006-2009 for populations of fewer than 200 students. From 2013 to 2015, there are larger Level 1 percentages across a wider range of population sizes. This could be due to the 2013 change in NY testing to align more closely with the Common Core, as mentioned in the Notes of the dataset. Next, let’s analyze the student level percentages across each borough for each grade. Given that “Number.Tested” is one of the few continuous metrics in this dataset, we use it on the x-axis so that the points are more spread out than they would be with discrete values like “Year” and “Grade”.
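The per-grade summaries that follow are R `summary()` output grouped by `school_data$Grade`. For readers who want to reproduce this kind of table outside R, here is a minimal stdlib Python sketch. The function name `summary_by_group` and the list-of-dicts row layout are assumptions for illustration, not the report’s code. The quartiles use `statistics.quantiles(..., method="inclusive")`, which interpolates at position (n - 1) × p, the same rule as R’s default quantile type.

```python
from collections import defaultdict
from statistics import mean, quantiles

def summary_by_group(rows, group_key, value_key):
    """Return {group: (Min, 1st Qu., Median, Mean, 3rd Qu., Max)}, like R's summary()."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    result = {}
    for key in sorted(groups):
        values = sorted(groups[key])
        if len(values) == 1:
            # A single value: every quartile equals that value.
            q1 = med = q3 = values[0]
        else:
            # "inclusive" interpolates at position (n - 1) * p, like R's default.
            q1, med, q3 = quantiles(values, n=4, method="inclusive")
        result[key] = (values[0], q1, med, mean(values), q3, values[-1])
    return result

# Hypothetical rows; the keys mirror the report's Grade and Level1Percentage fields.
rows = [{"Grade": 3, "Level1Percentage": v} for v in (0.0, 10.0, 20.0, 30.0, 40.0)]
rows += [{"Grade": 4, "Level1Percentage": v} for v in (5.0, 15.0)]
stats = summary_by_group(rows, "Grade", "Level1Percentage")
for grade, summary in stats.items():
    print(grade, summary)
```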
Level 1 percentage by grade:

| Grade | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 3 | 0.00 | 2.90 | 11.90 | 18.76 | 28.60 | 100.00 |
| 4 | 0.00 | 2.80 | 9.90 | 16.71 | 23.90 | 100.00 |
| 5 | 0.00 | 1.70 | 9.10 | 17.58 | 26.30 | 100.00 |
| 6 | 0.00 | 2.40 | 11.80 | 19.38 | 29.50 | 100.00 |
| 7 | 0.00 | 2.20 | 12.00 | 21.31 | 33.30 | 100.00 |
| 8 | 0.00 | 3.30 | 12.50 | 20.78 | 32.00 | 100.00 |
For Level 1, there seems to be a higher concentration of Bronx schools with high Level 1 percentages in grades 4 and 5, which diminishes in grades 7 and 8. In grades 3-5, the distribution looks fairly uniform across Number.Tested. In contrast, grades 6-8 show the Level 1 percentage decreasing as the number tested increases.
Level 2 percentage by grade:

| Grade | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 3 | 0.00 | 16.40 | 29.20 | 28.39 | 39.70 | 100.00 |
| 4 | 0.00 | 17.40 | 30.90 | 30.52 | 42.60 | 100.00 |
| 5 | 0.00 | 17.60 | 30.30 | 30.21 | 41.70 | 100.00 |
| 6 | 0.00 | 23.60 | 37.50 | 36.57 | 50.00 | 100.00 |
| 7 | 0.00 | 23.30 | 36.90 | 36.46 | 50.00 | 100.00 |
| 8 | 0.00 | 28.60 | 41.70 | 40.88 | 53.80 | 100.00 |
For Level 2, there seems to be a large proportion of observations around the 50% mark across all grades.
Level 3 percentage by grade:

| Grade | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 3 | 0.00 | 23.80 | 40.30 | 40.59 | 57.10 | 100.00 |
| 4 | 0.00 | 22.20 | 37.90 | 38.53 | 54.20 | 100.00 |
| 5 | 0.00 | 22.20 | 38.10 | 38.45 | 54.25 | 100.00 |
| 6 | 0.00 | 14.30 | 30.40 | 32.84 | 50.00 | 100.00 |
| 7 | 0.00 | 14.30 | 30.70 | 33.07 | 49.10 | 100.00 |
| 8 | 0.00 | 13.50 | 28.90 | 31.23 | 46.30 | 100.00 |
For Level 3, Brooklyn is neatly condensed between 25% and 75% in grades 3-5 and more scattered in the higher grades. More Queens and Staten Island observations appear at the higher percentages in this plot. Brooklyn has a noticeable concentration of points at the high-percentage end in grades 6-8, and the Bronx is concentrated mostly at the low end of the percentage scale across all grades.
Level 4 percentage by grade:

| Grade | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 3 | 0.00 | 0.00 | 6.70 | 12.26 | 17.90 | 100.00 |
| 4 | 0.00 | 0.00 | 7.40 | 14.24 | 20.80 | 100.00 |
| 5 | 0.00 | 0.00 | 7.40 | 13.76 | 20.00 | 100.00 |
| 6 | 0.00 | 0.00 | 4.00 | 11.22 | 15.00 | 100.00 |
| 7 | 0.000 | 0.000 | 2.300 | 9.165 | 10.800 | 100.000 |
| 8 | 0.000 | 0.000 | 1.800 | 7.105 | 8.300 | 100.000 |
For Level 4, we see many more points representing Manhattan, with a good concentration of Manhattan schools at the high-percentage end in all grades. Staten Island has a dense concentration around 50-75% in grades 3-5 and a more scattered distribution on the low-percentage side in the higher grades.
Brooklyn has high percentages of Level 4 students in grades 6-8, and Queens has high percentages of Level 4 students across all grades. Now let’s investigate the student level percentages for each grade by year to see how students in each grade performed over time.
Level 1 percentage by year:

| Year | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 2006 | 0.00 | 3.40 | 10.00 | 14.57 | 19.50 | 100.00 |
| 2007 | 0.00 | 1.40 | 6.30 | 10.56 | 14.30 | 100.00 |
| 2008 | 0.000 | 0.000 | 3.200 | 6.864 | 9.100 | 100.000 |
| 2009 | 0.000 | 0.000 | 0.000 | 3.715 | 4.400 | 100.000 |
| 2010 | 0.00 | 3.20 | 10.00 | 14.21 | 20.00 | 100.00 |
| 2011 | 0.0 | 2.4 | 8.3 | 12.8 | 17.5 | 100.0 |
| 2012 | 0.00 | 2.10 | 8.00 | 12.43 | 17.10 | 100.00 |
| 2013 | 0.00 | 18.90 | 36.60 | 37.99 | 54.80 | 100.00 |
| 2014 | 0.00 | 16.90 | 33.30 | 36.11 | 52.60 | 100.00 |
| 2015 | 0.00 | 16.70 | 33.30 | 35.67 | 52.00 | 100.00 |
2009 seems to be the best year for testing, with a high concentration of low Level 1 percentages. The summary statistics support this: 75% of the 2009 observations have at most 4.4% of students testing at Level 1. In 2013, when the test changed, the data shifted sharply, with the median Level 1 percentage reaching almost 37% (36.6%).
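The 2013 shift can be quantified from the Level 1 medians in the summary output above. The following minimal sketch computes the year-over-year change in the median; the dictionary layout is only for illustration.

```python
# Level 1 median percentages by year, copied from the summary output above.
level1_median = {
    2006: 10.0, 2007: 6.3, 2008: 3.2, 2009: 0.0, 2010: 10.0,
    2011: 8.3, 2012: 8.0, 2013: 36.6, 2014: 33.3, 2015: 33.3,
}

years = sorted(level1_median)
# Change in the median (percentage points) relative to the previous year.
changes = {y: level1_median[y] - level1_median[prev] for prev, y in zip(years, years[1:])}

biggest_jump = max(changes, key=changes.get)
lowest_year = min(level1_median, key=level1_median.get)
print(f"Lowest median: {lowest_year} ({level1_median[lowest_year]}%)")
print(f"Largest one-year increase: {biggest_jump} (+{changes[biggest_jump]:.1f} points)")
```

On these figures, 2009 has the lowest Level 1 median (0%). The largest one-year increase is in 2013: 28.6 percentage points over 2012 (from 8.0% to 36.6%).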
Level 2 percentage by year:

| Year | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 2006 | 0.00 | 16.70 | 29.60 | 30.03 | 42.20 | 100.00 |
| 2007 | 0.00 | 15.00 | 29.30 | 30.45 | 43.80 | 100.00 |
| 2008 | 0.0 | 11.1 | 24.5 | 27.3 | 40.0 | 100.0 |
| 2009 | 0.00 | 7.30 | 18.00 | 21.94 | 33.30 | 100.00 |
| 2010 | 0.00 | 27.30 | 40.70 | 39.22 | 51.60 | 100.00 |
| 2011 | 0.00 | 25.50 | 39.70 | 38.47 | 51.10 | 100.00 |
| 2012 | 0.00 | 23.00 | 37.50 | 36.61 | 50.00 | 100.00 |
| 2013 | 0.0 | 27.3 | 36.0 | 35.7 | 44.4 | 100.0 |
| 2014 | 0.00 | 25.90 | 34.70 | 34.58 | 42.90 | 100.00 |
| 2015 | 0.00 | 25.60 | 34.30 | 34.06 | 42.90 | 100.00 |
The percentage of students at Level 2 is substantial in every year except 2009. Judging by the plots, 8th graders are the most prevalent at this level.
6th graders also had noticeably high percentages of students at Level 2 in 2013 and 2014.
Level 3 percentage by year:

| Year | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 2006 | 0.00 | 32.80 | 46.20 | 44.33 | 57.10 | 100.00 |
| 2007 | 0.00 | 35.30 | 48.90 | 46.72 | 59.60 | 100.00 |
| 2008 | 0.00 | 42.00 | 55.20 | 53.04 | 66.70 | 100.00 |
| 2009 | 0.00 | 49.08 | 61.00 | 58.59 | 71.40 | 100.00 |
| 2010 | 0.00 | 22.20 | 32.80 | 32.61 | 42.70 | 100.00 |
| 2011 | 0.00 | 25.00 | 37.50 | 37.99 | 50.00 | 100.00 |
| 2012 | 0.00 | 26.10 | 38.00 | 38.71 | 50.90 | 100.00 |
| 2013 | 0.00 | 7.20 | 15.80 | 18.02 | 26.90 | 100.00 |
| 2014 | 0.00 | 8.30 | 17.40 | 19.33 | 28.60 | 100.00 |
| 2015 | 0.00 | 9.10 | 17.80 | 19.45 | 28.60 | 100.00 |
Large shares of schools had a majority of their students at Level 3 through 2009. The median Level 3 percentage was 55.2% in 2008 and 61.0% in 2009. It then dropped in 2010 and fell much more steeply in 2013, from a median of 38.0% in 2012 to 15.8%. 3rd and 7th graders seem to have handled the Common Core change in the test better than the other grades.
Level 4 percentage by year:

| Year | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 2006 | 0.00 | 0.00 | 5.30 | 11.08 | 15.40 | 100.00 |
| 2007 | 0.00 | 0.00 | 5.40 | 12.27 | 17.40 | 100.00 |
| 2008 | 0.0 | 0.0 | 6.0 | 12.8 | 18.0 | 100.0 |
| 2009 | 0.00 | 1.00 | 8.30 | 15.76 | 23.50 | 100.00 |
| 2010 | 0.00 | 0.80 | 7.90 | 13.97 | 20.20 | 100.00 |
| 2011 | 0.00 | 0.00 | 3.40 | 10.73 | 14.30 | 100.00 |
| 2012 | 0.00 | 0.00 | 4.30 | 12.25 | 16.70 | 100.00 |
| 2013 | 0.000 | 0.000 | 3.050 | 8.289 | 11.100 | 100.000 |
| 2014 | 0.000 | 0.000 | 4.200 | 9.978 | 13.900 | 100.000 |
| 2015 | 0.00 | 0.00 | 4.90 | 10.83 | 15.30 | 100.00 |
The percentage of students testing at Level 4 peaked in 2009, when the median was 8.3% and the IQR was 22.5 percentage points (from 1.0% to 23.5%). A good number of 6th and 7th grade classes reached Level 4 at high percentages across all years.
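The 2009 IQR and median can be checked directly against the other years using the Level 4 quartiles in the summary output above. The following minimal sketch does that; the dictionary layout is only for illustration.

```python
# Level 4 (1st Qu., Median, 3rd Qu.) by year, copied from the summary output above.
level4_quartiles = {
    2006: (0.0, 5.3, 15.4), 2007: (0.0, 5.4, 17.4), 2008: (0.0, 6.0, 18.0),
    2009: (1.0, 8.3, 23.5), 2010: (0.8, 7.9, 20.2), 2011: (0.0, 3.4, 14.3),
    2012: (0.0, 4.3, 16.7), 2013: (0.0, 3.05, 11.1), 2014: (0.0, 4.2, 13.9),
    2015: (0.0, 4.9, 15.3),
}

# Interquartile range = 3rd quartile - 1st quartile, in percentage points.
iqr = {year: q3 - q1 for year, (q1, _, q3) in level4_quartiles.items()}
widest = max(iqr, key=iqr.get)
highest_median = max(level4_quartiles, key=lambda year: level4_quartiles[year][1])
print(f"Widest IQR: {widest} ({iqr[widest]:.1f} points)")
print(f"Highest median: {highest_median} ({level4_quartiles[highest_median][1]}%)")
```

With these values, 2009 has both the widest IQR (22.5 points) and the highest median (8.3%) of the ten years.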
We can see a big disruption in the distribution of test scores from 2013, when NYC decided to align more closely with the Common Core Standards. 8th graders seem to have a difficult time with the standardized test, as they make up a large proportion of Level 1 and 2 students. 6th and 7th graders seem to be doing the best over the years, with large concentrations of Level 3 and 4 students. 3rd and 6th graders seem to have weathered the 2013 change in test standards. There is a dense concentration of Level 4 students in grades 3-5 from 2006-2009, which dwindles after that. It would be interesting to see how students in each grade performed in each subject.
Performance seems to be good from grades 3-7 but falls drastically in 8th grade. Grades 4-6 seem to be the top performers in ELA, and 8th graders have a large concentration of Level 1 and 2 students in both subjects.
I have created another dataset which has the average percentages across all levels of students. Let’s analyze this dataset to see if it gives us any additional insight.
The first thing I notice is that the Level 1 and 4 data are skewed, while the Level 2 and 3 data are fairly normal. The average percentage of Level 1 students is similarly distributed across all boroughs, but the Level 2 data shows more distinction. Queens’ and Staten Island’s means are below 25% from 2006-2009 but align with the other boroughs’ distributions from 2010 onward. The Bronx’s distribution shifts slowly to the right each year until 2013, when the mean shifts back to the left and stays there through 2015. Manhattan’s mean shifts left from 2006-2009, then right from 2010-2012, and stabilizes from 2013-2015.
In the Level 3 data, we can see why 2009 seems to be the best year: a large number of schools have an average percentage greater than 50% in every borough. Average Level 3 percentages were high from 2006-2009 but decrease each year afterwards.
The average percentage of Level 4 students is below 25% for all boroughs across all years.
Grades 3-5 had more schools with low average percentages (< 50%) of Level 1 and Level 2 students, and more schools with high average percentages of Level 3 students. The number of schools with Level 4 averages greater than 25% is also larger for grades 3-5 than for grades 6-8.
Grades 6-8 had many schools with average percentages around 50% for Level 2 students and around 20-35% for Level 3 students.
2009 had the highest median percentage of proficient students of all the years, with an interquartile range spanning 63% to 89%.
From 2006-2009, the median percentage of proficient students lies above 50%, and it falls below 50% in 2010-2011. After the sharp decline in 2013, the median reaches 25% in 2015. From 2013-2015, 75% of schools have fewer than 50% of their students proficient according to the standards.
The trends for Black and Hispanic students are roughly the same, and the trends for White and Asian students follow a similar pattern to each other. Comparing the trend lines for Black and Hispanic students with those for White and Asian students, they intersect at around the 80% proficient mark.
Manhattan’s IQR spans almost exactly 25% to 75%, making it the largest IQR of all the boroughs. 75% of Bronx schools have a proficiency rate below 55%.
Analyzing the NYC test data gave me quite a bit of insight into the city’s education system. Because my fiancée and my 11-year-old stepson live in the Bronx, I was especially interested in how students in the Bronx performed. Although I was not pleased with my findings, I was not surprised: the Bronx is one of the most impoverished and economically challenged boroughs in New York City.

I ran into a roadblock when I tried to correlate the borough test data with each school’s budget funding to see whether lower-funded schools had lower test scores. No dataset or API provides this information. Retrieving it would require screen scraping the HTML pages that display it, and doing that for tens of thousands of schools was not practical given the time constraints. I also ran into a roadblock when I noticed that the percentages in the 2006-2012 data had a different data type than those in the 2013-2015 data, so I had to write separate functions to wrangle each dataset before combining them. My final roadblock was discovering that the mean scale scores varied from year to year, which makes them unreliable for comparisons across years. This is why I stopped analyzing them once I combined the earlier data with the 2013-2015 dataset.

I did have success with the many ways the data was broken down (gender, race, borough, English learner status, grade, etc.), which let me look at the data through different categories. For future work, I would like to look at the top 10 and bottom 10 performing schools to see whether I can find any insight into why the top 10 perform so well and the bottom 10 do not. This could be analyzed borough by borough or by grade level.
Also, test scores seem to drop off once students reach 8th grade, which warrants further investigation. Finally, I would like more insight into why Black and Hispanic students are not performing as well as their White and Asian counterparts. This is very important to me as an African-American male who faced the same disconnect growing up in Atlanta, Georgia.